Distinction between Machine Printed Text and Handwritten Text in a Document

نویسندگان

  • Rajesh Pathak
  • Ravi Kumar Tewari
چکیده

In many documents machine printed& handwritten texts are intermixed .Optical Character Recognition (OCR) techniques are different for machine printed and handwritten text, so it is necessary to separate these text before giving input to the OCR. In this paper we are proposing methodology for Hindi language. This methodology is based on structural features of text. Experimental results on a data set show that the proposed approach achieves an overall accuracy of 95.1%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

متن کامل

Distinction between handwritten and machine-printed text based on the bag of visual words model

In a variety of documents, ranging from forms to archive documents and books with annotations, machine printed and handwritten text may coexist in the same document image, raising significant issues within the recognition pipeline. It is, therefore, necessary to separate the two types of text so that it becomes feasible to apply different recognition methodologies to each modality. In this pape...

متن کامل

Classification of Printed and Handwritten Text: a Review

Separating handwritten and machine printed text from a document has many applications. Various types of documents like bank cheques and forms etc. are used in daily life which contains both handwritten as well as printed text. It is necessary to separate handwritten and machine printed text before processing it with optical character recognition system. Various strategies are used to discrimina...

متن کامل

Shape codebook based handwritten and machine printed text zone extraction

In this paper, we present a novel method for extracting handwritten and printed text zones from noisy document images with mixed content. We use Triple-Adjacent-Segment (TAS) based features which encode local shape characteristics of text in a consistent manner. We first construct two codebooks of the shape features extracted from a set of handwritten and printed text documents respectively. We...

متن کامل

Ocr-optical Character Recognition

Optical Character Recognition or OCR is the electronic translation of handwritten, typewritten or printed text into machine translated images. It is widely used to recognize and search text from electronic documents or to publish the text on a website. OCR is the machine replication of human reading and has been the subject of intensive research for more than three decades. OCR can be described...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015